perm filename ORAL[TLK,DBL] blob sn#228494 filedate 1976-08-04 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00025 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	.!XGPCOMMANDS←"/TMAR=50/PMAR=2100/BMAR=50"
C00005 00003	.S(Introduction)
C00011 00004	.S(Discovering the Primes)
C00018 00005	.S(We Did a Problem-Reduction)
C00023 00006	.S(This General Sequence of Reductions)
C00031 00007	.S(Introduce Heuristic Guidance)
C00035 00008	.S(We Write a Theory Formation Pgm)
C00038 00009	.S(Mathematical Theory Formation)
C00043 00010	.S(Writing AM)
C00054 00011	.S(What AM Does)
C00063 00012	.S(Action 1: Fill in entries of facets)
C00066 00013	.S(Action 2: Creating new concepts)
C00068 00014	.S(Action 3: Proposing a new job)
C00079 00015	.S(Locating Relevant Heuristics)
C00089 00016	.S(Cardinality Example)
C00093 00017	.S(Finding Examples of Equality)
C00100 00018	.S(Generalizing Equality)
C00103 00019	.S(Example 2)
C00108 00020	.S(Results)
C00116 00021	.S(Experiments With AM)
C00123 00022	.S(Maximally-Divisible Numbers)
C00129 00023	.S(Conclusions)
C00132 00024	.S(Supplement)
C00137 00025
C00138 ENDMK
C⊗;
.!XGPCOMMANDS←"/TMAR=50/PMAR=2100/BMAR=50";
.DEVICE XGP
.FONT 1 "BDR40"
.FONT 2 "NGR40"
.FONT 3 "FIX25"
.FONT 4  "BDI40"
.FONT 5  "BDR66"
.FONT 6  "NGR25"
.FONT 7  "NGR20"
.FONT 8 "SUP"
.FONT 9 "SUB"
.TURN ON "↑α↓_π[]{"
.TURN ON "⊗" FOR "%"
.TURN ON "@" FOR "%"
.PAGE FRAME 41 HIGH 83 WIDE
.AREA TEXT LINES 1 TO 39
.AREA FOOTING LINE 41
.COUNT PAGE PRINTING "1"
.TABBREAK
.ODDLEFTBORDER←EVENLEFTBORDER←1000
.AT "<<" ENTRY ">" ⊂ 
⊗4<Still to do: ENTRY⊗*
. ⊃
.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
.MACRO E ⊂ APART END ⊃
.MACRO FAS ⊂ FILL ADJUST SINGLE SPACE PREFACE 1 ⊃
.MACRO S(ENTRY) ⊂ FAS SKIP 2
.IF LINES≤10 THEN START SKIP TO COLUMN 1 END;
.ONCE CENTER SELECT 5
*** ENTRY ***
. ⊃
.FAS
.COMPACT
.SELECT 1
.PORTION THESIS
.PAGE←-1
.NEXT PAGE
.INDENT 0
.TURN OFF "{∞→}"   
.S(Introduction)

This talk is about one particular heuristic program, called AM, which
emulates  some  of  the  activities  involved  in  doing  mathematics
research: namely, AM investigates  a given collection of objects  and
operators, and gradually enlarges it  by defining new concepts.  This
is  accomplished  under the  guidance of  a  large body  of heuristic
rules.

But before discussing  AM, let's spend  a few minutes thinking  about
how scientific discoveries are made; at least, about how they're made
in very elementary mathematics.

We'll gather a few hints from that analysis, and use them to  propose
a simple ⊗4model⊗* of how mathematical theories are developed.

At that time  we can turn  to AM, and view  it as a  computer program
embodying that model.

The first thing  I want to stress is the difference between solving a
given problem  and  thinking it  up in  the  first place.    Contrast
playing monopoly,  to inventing new  games of  the same quality.   Or
contrast  solving the missionaries and cannibals problem, against the
ill-defined reasoning which led to formulating it.

Turning to mathematics, if I show you the definition of prime
numbers, <SLIDE primes> you can find some for me without any trouble.
But whoever first  formulated that  concept performed  a much  harder
task.

.S(Discovering the Primes)


How did he do it?  How can we rationally account for the discovery of
prime  numbers?  We'd like  to propose a plausible  scenario in which
someone decides to  look at these numbers,  and then decides  they're
interesting.

Imagine that  a bright undergraduate  math major has been  given some
formal  system of objects and operators,  which he doesn't realize is
isomorphic to sets, set-operations, numbers, counting, and arithmetic.
He's  told  to  EXPLORE that  formal  system,  finding  out some  new
relationships between those  concepts which  were given  to him,  and
encouraged to  define  some new  concepts if  they  appear useful  or
interesting to him.  This student will also be called the RESEARCHER,
since he's doing what I mean by math research.

In that context, let me suggest to you one plausible scenario  of the
discovery of prime numbers.

Suppose  the researcher  has just  been  playing with  the notion  of
"divisors-of"  of a number.   One rule of  thumb he sometimes follows
when devising new theories  is to ⊗4consider extreme  cases⊗*. <SLIDE
goto extremes> In  more precise terms, this rule  says that if you've
just been studying an interesting function f which takes elements  of
set A into  those of set B, and  there is some extremal subset  of B,
it's worth taking the time to consider which elements from A map into
that distinguished subset.

In the  current situation,  the  function f  is "Divisors-of".    For
example, <SLIDE: Heur overlay> it takes the number 12 into the set of
numbers which divide evenly into 12, namely {1,2,3,4,6,12}. So f maps
a number into a set of numbers.  An interesting  or extreme subset of
B  would  thus be  some  extreme kind  of  set.   We  have names  for
extremely  small   sets,  for   example:  empty   sets,   singletons,
doubletons, tripletons.    The rule  of thumb  thus  says it's  worth
considering the  set of numbers over here  which map into empty sets;
it's worth isolating those numbers which map into singletons; the set
of numbers whose set of divisors  is a doubleton; numbers whose image
under the function divisors-of is a tripleton.

In  other words,  this rule has  suggested defining  and studying the
following classes of  numbers: the set  of numbers with no  divisors,
with only  1 divisor,  with 2 divisors  <SLIDE n divisors>,  and with
three divisors.    A  priori, any  or  all  of these  sets  could  be
interesting.  It will turn out that ⊗4this⊗*  set (2 divisors) is the
most interesting and useful one. Why?

Just knowing their definitions is no help.  The researcher might
take the trouble  to examine all  the numbers  less than a  thousand,
let's say,  and place them in  the appropriate set, depending  on how
many divisors they had. Based on this empirical evidence, he might be
able to perceive some regularity in one of these kinds of numbers.
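
(For concreteness, here is a minimal sketch in modern Common Lisp, rather than the Lisp dialect AM was actually written in; the function names are my own.  It carries out exactly that survey: compute divisors-of for each number up to some bound, and group the numbers by how many divisors they have.)

.B
(defun divisors-of (n)
  ;; the set of numbers dividing evenly into n, e.g. 12 -> (1 2 3 4 6 12)
  (loop for d from 1 to n
        when (zerop (mod n d)) collect d))

(defun numbers-with-k-divisors (k bound)
  ;; the inverse image, under divisors-of, of "sets of size k"
  (loop for n from 1 to bound
        when (= k (length (divisors-of n))) collect n))

;; (numbers-with-k-divisors 1 30) => (1)
;; (numbers-with-k-divisors 2 30) => (2 3 5 7 11 13 17 19 23 29) ; the primes
;; (numbers-with-k-divisors 3 30) => (4 9 25)            ; squares of primes
.E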

Well, these two  sets of numbers are  very small and there's  nothing
too interesting about  them.  Now we turn our  attention to these two
sets of  numbers...   Nothing obviously  interesting about  these  (2
divisors), but these (3 divisors) seem to all  be perfect squares.  A
natural question  to ask is: what are  square-roots of these numbers?
We then notice that all the members here just seem to be  the squares
of the members from  this set.  This may cause us  to spend some more
time studying both these sets.

After  examining some more empirical data,  the researcher may notice
that all numbers seem to factor uniquely into numbers drawn from this
set.   This is far from trivial,  and we'll go into how  he does this
later on.  He'll conjecture  the unique factorization theorem  <SLIDE
UFT> and notice that it can be made much less awkward if we give this
set a name, <SLIDE: Primes overlay> say PRIME NUMBERS.

Once he's got the definition of primes, and he's got a couple of good
reasons for giving them a name (theorems involving primes), we shall
say that he's actually ⊗4discovered⊗* them.

.S(We Did a Problem-Reduction)

We haven't completely accounted for the discovery of prime numbers;
we've just reduced our problem from "how in the world could somebody
invent the notion of prime  numbers" to the somewhat simpler task  of
"how in the  world could somebody invent the  notion of divisors-of".
<SLIDE Reduc 1> The reduction was accomplished by citing a heuristic,
which said:

⊗4"Investigate the inverse image of unusual or extreme subsets of the
range of some operation."⊗*

In this case, the  extreme subset was "doubletons", and the operation
was divisors-of.  The  inverse image of doubletons  turned out to  be
the set of numbers we know as the primes.

"Looking for extreme cases" is a pretty general heuristic.

It's not a rule of  inference, it's just a rule of  thumb; it doesn't
guarantee  anything, in  the sense  that  modus ponens  guarantees to
preserve validity,  or  in  the  sense that  Waltz's  constraints  on
shadowing line drawings were absolute.

This  heuristic rule  certainly isn't  applicable or  useful all  the
time.    And yet  it's  worth remembering  and using  because,  in my
experience, it frequently leads to interesting, valuable new concepts
to  investigate.     In  the  long  run,   using  this  heuristic  is
cost-effective.

A natural and fair question, now, is "OK, so you've rationalized  the
discovery of prime numbers, but why  was the researcher investigating
the  concept `divisors-of' to  begin with?".   Have we  really gotten
anywhere?  If "divisors-of" was one of the primitive concepts we gave
that researcher  to begin with,  then the answer  is Yes, we  have in
fact explained  how he might have discovered primes.  But assume that
we didn't give him divisors-of, just simpler arithmetic operations.

Maybe we can use the same technique to explain why the researcher
proposed this concept,  divisors-of.  We'd like to  grab some rule of
thumb and use it to rationalize that discovery.

But  look  at  the  definition  of  divisors-of(n)  <SLIDE:  Defn  of
Divisors-of>. It's  very closely  related to  the ⊗4inverse⊗* of  the
function  Times.    A  plausible  heuristic  tactic <SLIDE:  Look  at
inverse> is to investigate the inverse of each interesting operation.
So that  rule would account  for this  discovery, reducing it  to the
discovery  of the concept  of multiplication, because  looking at the
inverse of multiplication  would lead to  defining and studying  this
concept,  divisors-of.    Actually,  one  also  has  to  notice  that
multiplication is ⊗4associative and commutative⊗*, i.e., it can be
viewed as taking a ⊗4bag⊗* or multiset of numbers as its argument.

If you  can stand it,  we might try  to continue this  reduction even
further.  How  could we rationally explain how ⊗4this⊗* concept might
have been discovered?  You might consider discovering multiplication
as  a  numeric  analog  to the  operation  CARTESIAN-product,  or  as
repeated  addition.   Addition could have  been found  as the numeric
analogue of  union, or as  repeated successoring.   Successor is  the
numeric analogue of CONS or insertion.
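
(To make the "repeat an interesting operation" step concrete, here is a small hedged sketch, again in modern Common Lisp with names of my own choosing: each operation in the chain is just the previous one folded over n copies of its argument.)

.B
(defun repeat-op (op identity)
  ;; build the operation "fold OP over n copies of x", starting
  ;; from OP's identity element
  (lambda (x n)
    (let ((result identity))
      (dotimes (i n result)
        (setq result (funcall op result x))))))

;; multiplication is repeated addition; repeating the new operation in
;; turn gives exponentiation (and addition itself is the successor
;; repeated n times)
(defparameter times-op (repeat-op #'+ 0))
(defparameter expt-op  (repeat-op (lambda (a b) (funcall times-op a b)) 1))
;; (funcall times-op 4 3) => 12     ; 0+4+4+4
;; (funcall expt-op 2 5)  => 32     ; 1*2*2*2*2*2
.E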

.SELECT 1

.S(This General Sequence of Reductions)

<chain SLIDE>  We've performed  a sequence  of reductions,  <Overlay:
arrows down> from primes to divisors-of to multiplication to addition
to counting to set-insertion.   Each step was justified by some  rule
of thumb: Here  it is "consider extremals", here it  is "consider the
inverse  of an interesting relation", here and  here it is "repeat an
interesting operation".

We could continue this analysis, but at some point  the concepts we'd
have to deal with get  so elementary that we'd feel it makes no sense
to talk  about "discovering"  them.   At  some  point, they  are  the
primitive concepts we initially provided the student researcher with.

I BELIEVE THAT  when we hear of a  new discovery, we mentally  try to
analyze it in this  fashion ourselves.  We try to perceive a chain of
concepts, stretching from  that ⊗4latest⊗*  discovery backwards,  not
all the way to our conceptual primitives, but at least all the way to
concepts we  already are familiar with.  If we succeed right away, we
probably think that the discovery wasn't so hard, that  we could have
done it ourselves.


<Remove  down-arrow overlay> Very  often, a  new discovery  is really
just a couple  steps in this  kind of process,  one or two  heuristic
applications away from what is already known.  That is why it's often
easy for us to believe that there wasn't really so much effort needed
to make the  discovery.  If  we already know  this concept, and  this
heuristic, it almost seems obvious  once we've heard that someone has
taken this step.

Why  then don't we make  new discoveries every few  minutes?  Why are
even some 1-step advances worth publishing?  The answer  can be found
by seeing  how (according to this model)  very simple new discoveries
are made.

We're talking now about reversing the flow of time here.  Instead  of
analyzing a discovery by  breaking it into simpler and  simpler ones,
we're  talking about  the synthesis of  new concepts,  ⊗4the activity
which  the  researcher  was  carrying  out  all  along⊗*.     He  was
continually trying  to grow  new nodes on  this tree,  using whatever
heuristic rules he's learned over the years.

The researcher has a bunch of concepts that he knows about, say a few
thousand.  I  won't even consider  his ⊗4legal⊗* choices, since  they
are  quite  astronomical in  number.   Say  he  knows  a few  hundred
heuristics he can apply.   In any situation,  then, he may  literally
have a  million  alternatives to  consider, <OVERLAY  of red  lines>.
These are not his ⊗4legal⊗* choices, but his ⊗4plausible⊗* ones; each
and every one is justified by some heuristic, like this link is.

But nature is  not kind to us,  and only a  very small percentage  of
those new  concepts will turn  out to be  interesting.  What  is even
worse, he  can't whiz through this space, stopping at the interesting
concepts, because it may require hours -- sometimes even  years -- of
research to decide whether the concept he's studying is worthwhile or
a dead-end.

Even here, where we have just a few concepts and heuristics, the tree
gets bushy  when we try  to go  in this upward  direction.  We  could
apply "looking at inverse" here (CARTESIAN-product), to decide to
investigate the kind  of operation we normally  call projection.   Or
here (divisors-of) even if we decide to look for extremals, which way
do we  go?  Why not examine numbers which have very many, rather than
very  few, divisors?    Why  not  keep creating  new  operations,  by
repeating the  last one created,  thereby discovering exponentiation,
hyper-exponentiation, and so on?

The analysis procedure was a search for a path from a given node back
to primitives, but the  synthesis procedure is blind, doesn't  have a
well-specified goal, so it is combinatorially more explosive.

So it's  harder to MAKE a valuable  new discovery than to rationalize
how someone might plausibly have discovered something he's  just told
you about.   This explains why discoveries sometimes  seem so obvious
⊗4after the fact⊗*, even to their inventor who may have labored years
on the problem.

The same let-down  occurs when  a magician shows  you how he  manages
some magic  trick, or equivalently when an  AI researcher like myself
shows you how  his program  was really  able to  manage some  amazing
behavior.

.S(Introduce Heuristic Guidance)

Now it's clear  that this synthesis  task, exploring outward  to find
new valuable concepts, is a very explosive search process.

The standard Artificial Intelligence method for muffling an explosion
of red arrows  is to introduce purple  and green heuristic  guidance.
<2nd  OVERLAY: green/purple>:  purple strategies  highlight the  most
promising  nodes to  work on  at a  given moment, and  ⊗4wiggly green
ideas suggest  furiously⊗* one  or two operators  to try  first on  a
given node.

Recall  that each  red operator is  a rule  of thumb.   So  the green
strategies tell us, in a given situation, which heuristics are best
to apply.

From now on, when I use the term "heuristic rule", I'll really mean a
two-sided creature like this one <SLIDE: heurs>: a situation/action
rule, a production.  It  says that in some empirical situation,  here
are some plausible things to spend your time doing.

Here, the rule's  IF-part, or left-hand-side, was  asking whether two
concepts have some nontrivial yet unexpected overlap. If so, the rule
concludes that it's worth the trouble to isolate and explicitly study
that overlap.
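
(In implementation terms, such a rule can be nothing more than a predicate paired with a list of actions.  Here is a rough Common Lisp sketch; the structure and names are mine, not a transcription of AM's internal format.)

.B
(defstruct heuristic
  name
  if-part     ; a predicate on the current state: is this rule relevant now?
  then-part)  ; a list of actions to take when it is

(defun fire-heuristic (rule state)
  ;; run the rule's actions only when its left-hand side is satisfied
  (when (funcall (heuristic-if-part rule) state)
    (dolist (action (heuristic-then-part rule))
      (funcall action state))))
.E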

.S(We Write a Theory Formation Pgm)

The basic idea to remember from all  this is that simple mathematical
discovery  might be  successfully represented  as a  heuristic search
procedure: <SLIDE:  3  tasks> a  continual  expansion  of a  tree  of
concepts and relationships, guided  by some judgmental criteria, some
informal rules of thumb.

To test this hypothesis, I wrote a computer program, called AM,
which proposes  new concepts  in simple mathematical  domains.   This
program  goes  through the  same  kinds  of  concept exploration  and
development that the hypothetical  undergraduate researcher did.   AM
builds up a model of  some simple parts of mathematics.   Let me just
pause a moment and emphasize that AM is written and is running, and
has managed to discover  all the concepts  and theorems which I  have
mentioned so far, and the ones I will mention.

AM is a heuristic search program, and in the tradition of AI programs
of  this form,  there  are three  separate things  I should  tell you
about: the same things you always have to do to implement a heuristic
search:

.BEGIN INDENT 0,3,0 PREFACE 0

(1) specify the initial core of knowledge that it will start with,

(2) specify the legal ways of adding to that knowledge, and

(3) collect the strategies and tactics that experts in that field use
when trying to form theories.

.END

At a  higher level,  the "goal"  of  AM is  to find  interesting  new
concepts    and    conjectures,   by    exploring    a    space    of
partially-investigated concepts.

.S(Mathematical Theory Formation)

If anyone asks at  the end, I'll discuss why mathematics  is an ideal
field in  which to try out  these ideas.  For  now, assume that we've
settled on very elementary math.  Before carrying out these  3 steps,
I should at  least mention what I mean by  a ⊗4mathematical concept⊗*
and a ⊗4theory⊗*.

A  mathematical theory <SLIDE: thy> is  built on a ↓_⊗2FOUNDATION⊗*_↓
of primitive,  undefined ↓_objects_↓,  and  ↓_operators_↓, plus  some
postulates  and  axioms  which  interrelate  them.    Each  of  these
qualifies to be called a mathematical ↓_concept_↓.

Onto this foundation is built a tower of theorems -- statements which
are  implied by  the  foundation  --  plus some  definitions  of  new
concepts in terms of preexisting ones.

This  is the final,  breath-taking architecture  that we see  in math
textbooks.  ⊗4The foundation is laid carefully⊗*, and the  edifice is
built smoothly, brick by brick, with no mistakes, no backtracking.

What  they  ⊗4don't⊗* show  you  in  textbooks  and journals  is  the
unaesthetic  scaffolding that was required to  erect that tower: that
scaffolding is empirical induction and good old-fashioned search, with
guessing, intuition, back-tracking and everything.

At the very elementary  levels, at least, mathematics is an empirical
science.

Most of the mathematician's empirical forays are fruitless, but there
is no way to know  this before carrying them out.   His only guidance
are  his intuitions about  what to  investigate next, his  purple and
green heuristic strategies.

This same character of searching  a large space also carries over  to
AM.

<3tasks,  revisited SLIDE>  Recall that  there is  a standard  set of
tasks to perform, to create such a system.

.SELECT 1

.S(Writing AM)

The first  step was to  list all the  concepts the system  would know
about initially. That  is, a bunch of primitive notions which are the
starting state of the system's knowledge.

<SLIDE concept> Here are most  of the concepts that AM started  with.
For reasons  I'll go  into if  anyone asks,  I chose  a small  set of
elementary, pre-numerical concepts.

Included   here  are   static  structural  concepts   like  sets  and
ordered-pairs,    and    active    ones    like    union,    reverse,
compose-2-relations, repeat, and so on.

Note that there is no notion of numbers or proof.

At  a  slightly  deeper  level,  we  had  to  decide  precisely  what
information the system would be told about each of these concepts.

Certainly we had to provide a ↓_definition_↓ for each concept, and  a
name.

For each operation, we  also mentioned its ↓_domain and  range_↓, and
gave an algorithm for computing it.

What else should be supplied initially about each concept?

To help AM decide what to do, it would be nice to supply some kind of
rough judgment  of the  worth of  each concept:  its usefulness,  its
simplicity, symmetry, etc.  Instead of grappling with the metaphysics
of this problem, I simply attached, ad hoc, a number to each concept,
which was supposed to represent its overall worth.

If, by  introspecting, we  knew of  any heuristic  rules relevant  to
dealing with that type of concept, they were tacked onto it.

So there are many facets to each concept, <SLIDE facets>. Some of the
ones  we  didn't   mention  yet   were:  Examples,   Generalizations,
Specializations, and Analogies. Thus  by a "concept" I'll now  mean a
concrete kind  of data structure, similar to  Frames, Beings, and KRL
Units.

The set  of facets  is fixed  once and  for all.    Each concept  can
possess any or all  of these facets. But that's all  we can ever know
about a  concept.  A concept, as represented inside AM, ⊗4is⊗* simply
a bunch of facets, drawn from this fixed set.

In the LISP program implementing these ideas, each  concept is stored
as an atom,  and its facets are implemented as  the properties on its
property list.

Let's  consider  one  particular  concept,  and  see  what  kinds  of
information could  go into each  facet.   We'll consider the  concept
COMPOSE, which refers to the composition of two relations.  We define
AoB(x) as A(B(x)). That is, apply  B and then apply A to the  result.
<COMPOSE  slide>  <go over  each  facet>.    Here is  an  abbreviated
description of that concept.

Several equivalent definitions are provided. Some are computationally
efficient, and others are slow but especially easy for the system
to analyze.   Each definition is an  executable LISP predicate, which
takes 3 arguments  and tries to  see whether the  composition of  the
first two is equal to the third.

The  Domain/range  facet  notes  that  Compose   accepts  a  pair  of
activities, and creates a new one.

The specializations facet  points to another concept which is similar
to Compose.    This specialization  only  composes an  activity  with
itself.

An ⊗4example⊗* of Composition is provided here.

One conjecture that  AM found and stored here is  that composition is
associative.

Down here  we have some heuristics for dealing with compositions.  An
overall rating of this concept, how to spot an interesting composition
(e.g., if the domain and range of the composition are equal), hints
for filling in
certain facets  of  any  new  composition  which  gets  created,  and
something worth checking about each composition.

The format of each facet is fixed, but the formats vary from facet to
facet, from productions <heurs> to little programs <algs>, to bits of
declarative data <d-r>.
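
(Here is a rough sketch, in modern Common Lisp, of how a concept's facets can sit on a property list.  The facet names follow the Compose slide, but the particular entries and numbers are illustrative only.)

.B
;; each concept is an atom; each facet is a property on its property list
(setf (get 'compose 'definition)
      (lambda (f g h)
        ;; crude sample-based test that h behaves like "f of g"
        (every (lambda (x) (equal (funcall h x) (funcall f (funcall g x))))
               '(0 1 2 3))))
(setf (get 'compose 'domain-range) '((activity activity) -> activity))
(setf (get 'compose 'worth) 300)                      ; illustrative value
(setf (get 'compose 'conjectures) '(composition-is-associative))

;; any facet of any concept is retrieved the same way:
(defun facet (concept facet-name) (get concept facet-name))
;; (facet 'compose 'worth) => 300
.E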

Let me put  back the  list of concepts  again, and also  our list  of
facets.<SLIDE>

I provided the contents of a few of these facets by hand, for each of
the  initial  100 concepts.  AM later  filled  out the  other facets.
Whenever AM created a new concept, that concept of course had no
"hand-supplied" facets at all.

.S(What AM Does)

What does  the  system actually  do, now?   From  a heuristic  search
viewpoint,  the primary  activity of  the system  is to  ⊗4create new
concepts⊗*.  These  are  the  highly-visible  "legal  moves":  taking
existing concepts and combining them, conjoining them, disjoining
them,  running them, and modifying their  definitions.  This plethora
of alternatives  is too big  to consider,  so AM  only creates a  new
concept when some heuristic rule indicates that it might be worth the
gamble. So the heuristics act like little plausible move generators.

When AM does create a new concept, most of its facets will be  blank,
so that in  practice most of the  system's time is spent  in fleshing
out some recently-discovered concept, filling in some of its facets.

In other words, there are ⊗4TWO WAYS⊗* that AM expands its knowledge.
One way is to pick an already-known concept <SLIDE: Compose>  and try
to find  out more about  it, by filling in  some facet of  it <SLIDE:
Compose.exs  overlay>. The second way AM  expands its knowledge is by
defining new concepts out of old ones <SLIDE: ∪o' overlay>.

In the LISP  implementation, that translates  to (1) filling out  the
properties of existing data structures (here some examples of
Compositions have been  found), and sometimes  (2) creating  entirely
new data structures (Here the system takes  one particular example of
Compose and promotes it from being just one entry on one facet of one
concept, into being a full-fledged concept module with facets  of its
own).

Since the "fleshing  out" activity is quite expensive,  there must be
some  "fine structure" to the process of  growing a new node onto the
tree of concepts.  This  is a non-standard problem, not  faced during
your  typical  heuristic  search.   We  can't  permit  the system  to
completely fill out each new node (concept) it wants to look at.

The solution I use is  to have AM maintain  a list of very  plausible
tasks, and repeatedly choose  the most promising of them  to execute.
Each task is at the level of filling out a particular blank facet of a
particular concept. So a few  facets of some concept will get  filled
out, and the situation at that moment might then cause AM to shift to
filling in  some facets of an entirely different concept.  When a new
concept gets created, most of its facets may remain  blank for a long
period of time.

The LISP  implementation of this idea  is to have  a global job-list,
<SLIDE joblist> an agenda of  suggested activities to do, an  ordered
list of candidates for  the system's attention.  Each  entry consists
of a specific task for the system to work on, a priority value, and a
list of reasons why this is a reasonable task to do.  Except for this
list of  symbolic reasons, this is  the same as the  agenda mechanism
used in KRL, in the Dendral Predictor, and in LT.

At  some moment, there  might be  hundreds of entries  on the agenda;
this one indicates that the system should spend some time filling in
examples of the concept Primes.

This agenda  governs the top-level control structure  of the program.
The system picks the highest priority job off this list, and executes
it.

The priority  rating is a  function of the  values of  these reasons.
They  in  turn are  proposed  locally, by  the  heuristic  rule which
suggested this job in the first place. We'll see this in  more detail
in a minute.
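
(A minimal Common Lisp sketch of that control structure, as I have just described it; execute-job is assumed to exist and to do the actual work of a task.)

.B
(defstruct job
  task       ; e.g. (fill-in examples primes)
  reasons    ; a list of (reason-phrase . rating) pairs
  priority)  ; 0 to 1000, computed from the reasons

(defvar *agenda* '())       ; kept sorted, best job first

(defun add-to-agenda (job)
  (setf *agenda* (sort (cons job *agenda*) #'> :key #'job-priority)))

(defun am-top-level ()
  ;; repeatedly pick the highest-priority job and execute it;
  ;; execute-job (assumed) may fill in facets, create concepts, add jobs
  (loop while *agenda*
        do (execute-job (pop *agenda*))))
.E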


In the process of performing a job, 3 kinds of things can happen:

.BEGIN PREFACE 0 INDENT 3

(1) blank facets  may get filled in.   In the case of  this task, the
Examples facet of the Primes concept would hopefully get filled in.

(2)  new  concepts may  get  created.   Up  here, when  this  task is
executed, the resultant generalizations will be full-fledged concepts
themselves.

(3) new tasks may get  added to the agenda. After filling in examples
of primes, AM might find them so rare that it would add a task of
the form "Generalize the concept of Prime numbers".

.END


<SLIDE: 3 actions> Each of these activities  is guided by a heuristic
rule. I'd  like to take a minute and  illustrate how a heuristic rule
can manage each of these kinds of actions.  The basic idea is  that a
heuristic rule  triggers, because  its left side  is relevant  to the
current  state of the world.   The right side of  the rule contains a
list of  actions to  perform, and those  actions are  of these  three
types.

.S(Action 1: Fill in entries of facets)

The first  kind of  action that a  heuristic rule  can direct is  the
filling  in  of specific  pieces  of information  about  a particular
concept.

For example, here's a rule <SLIDE: fill exs> which gives one strategy
for  filling in  examples of  any  concept C.    It says  to find  an
operation  whose range is C, then find  examples of the domain of the
operation, and simply run that operation and gather up all the values
it returns.  Those values should then be examples of C.

Here  is how  it might be  used to  fill in  some examples  for Sets,
assuming a few were already known. It would say to find an  operation
whose range is Sets.  One such known operation is Union. Then it says
to run that operation, and the results will be examples of Sets. Sure
enough, if we do that, we do produce lots of sets.  Some of them will
be sets which weren't known before.
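
(A sketch of that rule in Common Lisp.  The helpers operations-with-range, known-examples-of-domain, and add-example are hypothetical; facet is the accessor from the earlier property-list sketch.)

.B
(defun fill-in-examples-via-range (c)
  ;; to find examples of concept C: find an operation whose range is C,
  ;; run it on known examples of its domain, and collect the results
  (dolist (op (operations-with-range c))
    (dolist (args (known-examples-of-domain op))
      (add-example c (apply (facet op 'algorithm) args)))))

;; e.g. Union has range Sets, so running it on known sets yields new Sets:
;; (union '(a b) '(b c)) => a set containing A, B and C
.E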

A similar kind of heuristic rule is used to find conjectures: Here is
↓_a  rule  <SLIDE:  find  conjec>  which  has  AM_↓  ask  if a  known
generalization of concept  X is really equivalent  to X. If so,  this
gets  recorded, as  an entry  on the  Conjecs facet  of  the concepts
involved.  This is the only way that AM noticed any conjectures: when
specifically directed to do so by some heuristic rule or other.

.S(Action 2: Creating new concepts)

<SLIDE: remove overlay from 3 actions> We've  seen how heuristics can
add entries.  How can they cause new concepts to be created?

Consider this heuristic: <SLIDE: Create-con>. It says to look for the
inverse image of extreme elements under an interesting operation.

We  already  saw  how  this  heuristic  can  lead  to  proposing  the
definition of prime numbers, when the operation F is divisors-of.

The heuristic contains enough information  to define the new concept,
and  to fill in any other information  which is trivial now but might
be very  hard  to reconstruct  later  on (like  why the  concept  was
created).

.S(Action 3: Proposing a new job)

<SLIDE: remove overlay from 3 actions> The  third kind of action that
a  heuristic rule can  initiate is to  suggest a new  task which then
gets added to the agenda.  Let's see how this happens.

Consider a heuristic rule, like this one <SLIDE: New-job>.
Somewhere  among the  actions to
take,  listed here on its  right side, is  one which explicitly calls
for a new task  to be added to the agenda.   The rule says that  if a
predicate appears to be  rarely satisfied, Then the following task is
plausible: fill in some generalized forms of that predicate.   Notice
that the rule also specifies an English phrase which is the ⊗4reason⊗*
why the  task is now plausible,  and also supplies a  way to rate the
overall worth of that reason.

The precise numbers turn out not to be important.

.SELECT 1

Let's see how  this particular rule  might be used  to suggest a  new
task for the agenda.  Say  AM recently worked on finding things which
were  Equal to each other, but that  very few random pairs of objects
turned out  to be equal.   Then  this rule  might trigger, and  would
propose this  task: <SLIDE: task1>.   If this task  wasn't already on
the agenda, it would get inserted there.

But suppose ⊗4this⊗*  <SLIDE: partial  agenda> was the  state of  the
agenda at the time.  We see right here that the task is already on the
agenda, but  for a different reason.  In such  a case, the new reason
would be  recorded, and  the priority  value of  the  task would  get
boosted.  <Full agenda> Here is the way that task would look. The new
reason has raised it to the top of the agenda.

This  turned out to  be a  surprisingly important mechanism.   When a
newly-proposed job turns  out to be  ⊗4already⊗* on the agenda,  then
its reasons  are examined.  If  the job is being  proposed for pretty
much the  same  reasons  as  before,  then  its  priority  rating  is
incremented only  slightly.  But  if it is  being proposed for  a new
reason, as in this case, then its priority is increased tremendously.
⊗4This is the practical importance of maintaining reasons for each job.⊗*
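
(A rough sketch of that bookkeeping, using the job structure sketched earlier.  The specific increments are placeholders; as noted earlier, the precise numbers are not important.)

.B
(defun propose-job (task reason rating)
  (let ((old (find task *agenda* :key #'job-task :test #'equal)))
    (cond ((null old)                        ; brand-new task: just add it
           (add-to-agenda (make-job :task task
                                    :reasons (list (cons reason rating))
                                    :priority rating)))
          ((assoc reason (job-reasons old) :test #'equal)
           (incf (job-priority old) 10))     ; same old reason: slight nudge
          (t                                 ; a genuinely new reason: big boost
           (push (cons reason rating) (job-reasons old))
           (incf (job-priority old) (* 3 rating))))))
;; (the agenda is then re-sorted; the increments above are stand-ins
;;  for AM's actual rating formula)
.E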

AM can compare two reasons to  see if they are equal, and that's  all
it has to do to manage this scheme.  Other than that, there is very
little  that AM can do to  a reason, except type  it out to the user.
Each reason is actually just a token, but as  you can imagine they're
quite useful for  keeping the human user aware of  what the system is
doing and why.

In general, the priority value of a task on the agenda depends on  the
ratings  of all  the  supporting  reasons.   The  reasons  are  rated
locally, by  the same heuristic rules which  proposed them.  A single
⊗4global⊗* formula then looks at the job and the reasons, and assigns
it a  single priority rating from  0 to 1000.   <SLIDE: Formula> That
formula  was just guessed  at as a  first pass, and  it's worked well
enough  not  to  tamper  with  it.    It   involves  squareroots  and
dot-products of vectors and isn't really worth delving into.

Each concept, each facet, each  reason supporting a job on the agenda
has a number  attached to  it.  This  formula uses  those numbers  to
attach an  interestingness factor,  a priority  rating, a number,  to
each job on the agenda.

<SLIDE: Put  back full agenda> AM looks  over the job-list, picks the
job with the highest priority, and tries to execute it.  The  flow of
control is so distributed that it's almost unpredictable.

That priority rating  of the chosen task then dictates  how to manage
the  system's resources:  it indicates the  number of  cpu seconds to
spend before giving up and trying the next task on the agenda, and it
specifies the number of list  cells that the task is permitted to use
up before it must quit, and so  on.  (This is like Danny Bobrow's idea
of Resource-limited computation.)

We are assuming that there is a  self-consistent set of computational
rules  for manipulating the estimated  worth of all  the concepts and
jobs.   This is  at  best a  first-order theory  of  interestingness.
Comparing the worths  of "equality" and "primes"  is really comparing
apples to oranges.

I  think that this scheme works  adequately here only because neither
the heuristics nor the jobs are strongly coupled; they are all pretty
much independent of each other.

For example,  it doesn't really matter  which order the  top few jobs
get executed in; no one cares whether we look for primes right before
generalizing equality, or right afterwards.   What ⊗4does⊗* matter is
that  they are  near the  top, not  near the  bottom, of  the agenda.
That's why a crude kind of evaluation function is all you need.

.SELECT 1

.S(Locating Relevant Heuristics)

<SLIDE: where we are> We've seen 3 kinds  of actions a heuristic rule
can  take.  Furthermore, we've  seen how the  agenda mechanism always
keeps AM  working on  a  plausible task.   But  we're not  really  in
business until I  explain one more thing. Once a  task is chosen, how
are  the  right  heuristic  rules located,  the  ones  which  will be
relevant to the task just chosen?

There are about 250 heuristic rules altogether in AM; that's too many
to simply evaluate the left sides of ⊗4all⊗* of them.

After explaining this, the overall behavior of AM will be described.

Like a human mathematician, AM should only access a strategy in those
situations where  it could  plausibly be  used.   By looking  at  the
heuristics,  it seemed  that each  one  has a  very  clear domain  of
applicability, of relevance.


For  example,  <SLIDE:  compose  heur>  one  heuristic  says  that  a
composition FoG is  interesting if it preserves  some of the  unusual
properties of F and G. This is  a useful thing to know when trying to
judge the  interestingness of a composition, but it won't help you if
the current  task is  to check  examples of Lists.   We  may as  well
associate this heuristic with the concept Compose.  We'll record this
heuristic rule inside the Interest facet of the Compose concept.
That's what this orange line means.

Other heuristics are  a little bit  more general, they relate  to any
operation <SLIDE:  Operation heurs overlay>, so  they would be tacked
onto the concept Operation.  This one is an entry on the Fillin facet
there.  This  one is on the Interest facet  of the concept Operation.
It says that any operation is interesting if its domain and range are
identical.

As AM was written, I gathered heuristics by introspection, and tacked
them onto the concept I  thought they were most relevant to.  <SLIDE:
Genl branch> The more general the  heuristic rule, the weaker it  is,
typically.  So each individual heuristic rule is tacked onto the most
general kind of entity  to which it applies.  Whenever I thought of a
heuristic, I pushed it as high as possible in this tree  of concepts.
In fact,  half of AM's heuristics  are tacked onto these  two topmost
concepts.

When  dealing with any concept X, all  the heuristics attached to any
generalization of X, that is, any higher node, are also considered to
be relevant to X.

Say AM wants to see how interesting this particular concept is.  Then
AM could  use  evaluation  heuristics  gathered  from  any  of  these
concepts.  Any reason why an operation might be interesting will also
be valid here.

The  system  would  ripple upward,  in  the  direction of  increasing
generality,  collecting  heuristics  as  it  went.    They  would  be
evaluated, and  their weighted average  result would be taken  as the
estimated interestingness of this concept.

In   fact,  <OVERLAY>  ⊗4these⊗*  features   are  satisfied  by  this
operation, so  they  combine to  indicate it  is  a mildly  promising
concept.  Notice that the further upward we go, the less specific the
heuristics get, and the less useful they are.

If AM wanted  to fill in  generalizations of this  concept, it  would
ripple   upwards  again,   gathering   heuristics  for   filling   in
generalizations.   Any technique for  generalizing an operation would
certainly work here too.

The same kind  of rippling  would collect relevant  heuristics if  AM
wanted to fill in examples of a certain concept, or check the entries
of the domain/range facet of this concept.
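
(That rippling is easy to sketch, assuming the facet accessor from the earlier sketch and a Generalizations facet linking each concept to its more general parents.)

.B
(defun relevant-heuristics (concept facet-name)
  ;; gather heuristics from CONCEPT itself, then from each of its
  ;; generalizations, rippling upward; the most specific rules come first
  (when concept
    (append (facet concept facet-name)
            (loop for g in (facet concept 'generalizations)
                  append (relevant-heuristics g facet-name)))))

;; e.g. heuristics relevant to Equality ripple up through more general
;; concepts such as Activity and Any-concept
.E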


Since  the  heuristics  attached  to  these  (specific)  concepts are
typically more powerful  than the heuristics  way up here, AM  always
executes  the  heuristics  in  the  order  they are  collected.    No
sophisticated reasoning is done  to order them in  some way prior  to
obeying them.

This works  [in practice]  because the  heuristics rarely couple;  at
most, they superimpose their effects; in either case the ⊗4order⊗* in
which they are executed  is not too  important.  They don't  interact
much.  Simple schemes like this  only work when the components of the
system  are  very independent  of each  other,  when the  problem was
easily   decomposable  or   factorable   to   begin  with.      Tight
interdependence would require a more sophisticated approach.

.S(Cardinality Example)

Let's  finally  take a  look  at  the  program in  action.    <SLIDE:
Cardinality slide>


.SELECT 1

This  is a  condensed excerpt  from a  session with  AM, as  it first
discovers concept  of numbers.   AM  already knew  about sets,  bags,
lists,  etc., and  equality of  2  sets.   A  bag is  a multiset,  an
unordered structure with repeated elements allowed.

I'll read  through it once to show you that AM really performs magic,
then we'll peek behind the curtains to see that AM  is really no more
than the straightforward data-structure expander that I've been
describing.

:: After  testing random  structures to  see if  they are  equal,  AM
concludes that  it would  be worthwhile  to generalize the  predicate
EQUAL. The user asks "why" and AM mentions that few randomly selected
objects turned out to be equal to each other.

:: Two generalizations of EQUAL are constructed.  The first turns out
to be  the relationship "has the  same length as", and  the second is
"has the same first element as".

::  After   testing  random  structures  to  see  if  they  have  the
SAME-LENGTH, AM  decides to  try to  find  a canonical  form for  all
structures.   This would  mean that two  structures were of  the same
length if and only if their canonical forms were identically equal.

:: AM comes up with the awkward idea of making this conversion by (i)
converting each structure to  a BAG, and (ii) replacing  each element
by the constant T.   Thus the canonical form of our alphabet would be
a bag of 26 T's.  AM decides to restrict various operations like Union
and Intersection to such canonical structures, and see what happens.

The user  tells AM  that such canonical  structures should  be called
NUMBERS. Those  restricted bag-operations will turn into what we call
arithmetic.

Well, now that you've seen what AM says, let's see how it does it.

.S(Finding Examples of Equality)

Recall that  Initially, I supplied  by hand a  definition or  two for
each of the  115 concepts AM started with. Each such definition was a
LISP predicate. I also gave each concept a rough numeric Worth value.
Each concept  which was  an operation also  received an  algorithm or
two,  and a statement  of domain/range.  Finally,  I introspected and
collected all the  general heuristics I  could, and tacked them  onto
the  appropriate concepts. Most  of these  heuristics applied  to the
general concepts, like  Operation. Very few  applied to the  specific
concepts, like Equality or Reverse.

Equality was one  of the initial  base concepts. It originally  had a
very low  Worth rating.  AM began to notice,  e.g., sets all of whose
members were equal,  and gradually  boosted Equality's Worth  rating.
When Equality became  valuable enough, a general  heuristic rule said
to  consider filling in some  examples of it.  This  was added to the
agenda, and eventually AM got around to that task.

How in fact  did AM find examples  of objects which were  EQUAL, once
that job was  chosen?  The job said to fill  in the EXAMPLES facet of
the EQUAL concept.   To do that, AM  gathered heuristics by  rippling
outward <SLIDE: Genls(Equality)> from Equal.

There were no heuristics located here, but some were found on each of
the higher, more general concepts.

One  of these (Any-concept)  rules of  thumb said to  instantiate the
definition of the predicate EQUAL.   Three definitions of Equal  were
supplied by me initially.  Here is one recursive definition of EQUAL:
<defn  of EQUAL  slide>.   AM possesses  several syntactic  rules for
instantiating LISP  predicates,  which is  all  that this  definition
really is.

An easy way to instantiate  this, to satisfy this, would be simply to
satisfy this  base step.   To  do  that, the  two arguments  must  be
identical atoms.  So  some examples like NIL=NIL, T=T,  are produced.
Also,  AM inverts this  recursive step,  by plugging in  here objects
already known to be equal.   But AM just  found that NIL=NIL.  So  if
the CARs  of 2 structures  are both T, and  their CDRs are  both NIL,
then  the structures  will be Equal.  So an  example like  (List T) =
(List T) is generated.

An entirely  different approach  is suggested by  a heuristic  tacked
onto the Activity concept.  It says that, when filling in examples of
any activity, including a predicate, we can randomly instantiate  the
arguments, run the concept, and observe the  results.  So AM accesses
the  domain/range facet  of EQUAL,  sees there the  entry: a  pair of
objects gets mapped into True/false. 
So AM accesses the Examples facet of Object, and acquires a huge collection
of objects.
Then AM repeatedly  picks random
pairs  of these objects  and sees  if they  satisfy EQUAL,  by running  the
little program  stored in the Algorithm facet of Equal.  This is where
the empirical  data comes  from that  tells  AM how  rarely EQUAL  is
satisfied.
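
(A small sketch of where that empirical data comes from; the name and the sampling scheme are mine, not AM's.)

.B
(defun how-often-satisfied (predicate objects trials)
  ;; run PREDICATE on randomly chosen pairs of OBJECTS and report the
  ;; fraction of successes -- the empirical data behind "EQUAL is rare"
  (/ (loop repeat trials
           count (funcall predicate
                          (nth (random (length objects)) objects)
                          (nth (random (length objects)) objects)))
     trials))
;; a low value here is what triggers "consider generalizing EQUAL"
.E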

We already  saw the heuristic that said  to consider generalizing the
definition of a predicate, if very few  examples can be found.  So  a
new  job  is added  to  the  agenda,  which says  to  generalize  the
definition  of EQUAL.   Since it  has a good  reason, it gets  a high
priority value, and it soon rises to the top of the job-list.

⊗4Note⊗* that  the  techniques  for  filling in  all  those  examples  of
Equality were drawn  from very general concepts: ways to  fill in any
concept,  ways  to  fill  in  any  active  concept.    There were  no
special-purpose tricks,  no  specialized  heuristics for  filling  in
examples of Equality.

Similarly,  AM used very  general techniques  for noticing  that this
predicate should be generalized, and for generalizing its definition.
AM in fact  was given only a  few syntactic rules for  generalizing a
LISP   predicate,  and   therefore   some  semantically-uninteresting
generalizations of Equal  were created.   

.S(Generalizing Equality)

Assume that  AM has plucked  that job off  the agenda, the  one which
says  to generalize  the  definition of  EQUAL.   AM  magically turns
Equality into Equal-length.

It  does  this  using  the  syntactic  rules  for   generalizing  the
definition of  any concept.  One  rule says to remove  a conjunct, so
that the new predicate would recur in only one direction.

In particular, by doing away with the test in the CAR direction <defn
OVERLAY slide>, we get a  predicate which counts down two  lists, and
returns T  if and only  if they both  become empty at  the same time.
That is, iff they have the same length.
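
(In Common Lisp terms the move looks roughly like this; it is a reconstruction of the recursive definition shown earlier and of the generalization just described, not AM's actual code.)

.B
(defun equal? (x y)
  (or (eql x y)                          ; base step: identical atoms
      (and (consp x) (consp y)
           (equal? (car x) (car y))      ; recurse in the CAR direction
           (equal? (cdr x) (cdr y)))))   ; recurse in the CDR direction

;; the generalization: drop the CAR conjunct, and only lengths get compared
(defun same-length? (x y)
  (or (eql x y)
      (and (consp x) (consp y)
           (same-length? (cdr x) (cdr y)))))
;; (same-length? '(a b c) '(1 2 3)) => T
.E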

A few other concepts were generated which were boring generalizations
of Equality. There are  two reasons why syntactic rules  suffice: (1)
AM never generalizes X unless it has a very good reason, so this is a
very rare occurrence;  (2) very quickly,  AM will notice  interesting
relationships  involving some  of  these, and  lose  interest in  the
other, boring ones.

<SLIDE:  User sees it again> The second  magic trick was deriving the
canonical  form  of   a  structure.     AM  did  this  by   selective
experimentation, which I'll describe later if anyone asks.


This  whole development  was done  to satisfy  just 3  jobs  from the
agenda list:

.BEGIN FILL INDENT 4,0,0 PREFACE 0

(Fillin Exs. of Equality)

(Generalize Defn. of Equality)

(Canonicalize Same-length wrt Equality)

.END


This example is really compressed by a factor of 10; it distorts AM's
abilities and limitations.  Let's move at a faster pace through
another, longer, but more faithful excerpt of AM in action.

.S(Example 2)

.BEGIN TURN ON "∞→"

After  going through  this  final  example,  I'll just  describe  the
overall results of this project.

We'll start in the middle of a session with AM, after 64 other tasks
have been  chosen and executed.  AM has  already discovered  numbers,
arithmetic, squaring, square-rooting, odd  numbers, and several other
elementary  objects  and operators.    The concept  of  Divisors-of a
number has just been defined.

<SLIDE: Ex2>

Task 65: AM was interested  in the concept of Divisors-of,  and found
several examples of it.

66:  The  inverse  images of  various  extremal  kinds  of sets  were
considered.  AM noticed that numbers-with-3-divisors all seemed to be
perfect squares.  How  did it notice that relationship?   It took one
example of  these numbers, 49.  AM then asked  what 49 was an example
of,  besides  Numbers-with-3-divisors.    The  answer  was:
odd-number,  perfect-square.   So one  hypothesis was  that all  such
numbers  might be perfect  squares. This  was tested on  the other 10
examples of Numbers-with-3-divisors, and sure enough they all were
squares.     So   AM  conjectured   that  Nos-with-3-divisors   is  a
specialization of the concept Perfect-squares.

⊗3↓_∞ → _↓⊗*

67:  The   square-roots   of  those   numbers   turned  out   to   be
numbers-with-2-divisors.   Which the user  then renamed  as "Primes".
The value of both concepts was raised.

⊗3↓_∞ → _↓⊗*

79:  Later,  AM   examined  the  inverse  of  Times.    For  example,
Times↑-↑1(12) is the set  of all possible ways  of factoring 12.   It
contains  the bag  (2 2  3),  since 2x2x3  is  12.   AM noticed  that
Times↑-↑1 of each number seemed to contain a bag of primes.  So a new
concept was  created, a  new relation which  maps a  number into  all
possible factorizations of that number into primes.
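
(For concreteness, a small Common Lisp sketch of the bag of primes that a number factors into; this is my own direct rendering, whereas AM arrived at the concept by inverting Times.)

.B
(defun prime-factorization (n)
  ;; the bag of primes whose product is n, e.g. 12 -> (2 2 3)
  (loop with m = n
        for p from 2
        while (> m 1)
        append (loop while (zerop (mod m p))
                     collect p
                     do (setq m (/ m p)))))

;; (prime-factorization 12) => (2 2 3)
;; the conjecture at step 80: every number greater than 1 has exactly
;; one such bag
.E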

⊗3↓_∞ → _↓⊗*

80: Lo and behold, that relation turned out to be a function. This is
the fundamental theorem of arithmetic.

⊗3↓_∞ → _↓⊗*

84: Lest you think that AM was always this brilliant, let's  continue
and look at one of the next things it  did. It examined the inverse of
Addition.  For  example, Add↑-↑1(6) contains the bag (1 1 2 2), since
1+1+2+2 equals 6.   
This is related to the Partition function.
One of many conjectures  AM notices then is  that
Add↑-↑1 of a  number often contains a pair of  primes. A new relation
is  created, which takes a number and  returns a list of all possible
ways of representing it as the sum of two primes.

⊗3↓_∞ → _↓⊗*

106: Eventually, AM studies  this, and notices that all  even numbers
seem to  have at least one  such representation.   This is Goldbach's
conjecture.   AM  doesn't rate  this  conjecture very  highly,  since
primes are not intimately connected with addition.

107: Here, AM isolates those numbers which can be represented as the
sum of 2 primes in only one way. While this is a novel set of numbers
to investigate, AM wasn't able to find anything interesting out about
them.

⊗3↓_∞ → _↓⊗*

.END

.S(Results)

Now that we've seen a  few examples of AM in action,  let's step back
and look at the whole run, of which these were just excerpts.

This  run  occurred  a few  months  ago.   AM  started  with  the 115
prenumerical concepts  I  showed you  earlier,  with about  8  facets
filled  in   for  each:  typically,  the   name,  definition,  worth,
domain/range, algorithm, fillin heuristics, checking heuristics, and a
pointer to a known generalization.

AM worked completely by itself for 1 cpu hour, until it ran out of
space and  died.  AM  had filled in  a total of  200 previously-blank
facets on these primitive concepts, and it had created about 135  new
concepts. The  new concepts  ended up  with an  average of 10  facets
filled  in.  Half of  those new ones were  not really worth defining,
and were technically  called losers, but  the rest were  interesting.
<SLIDE: Winners>. Note that AM went through the whole chain of discoveries
I  talked  about during  this talk,  all  the way  from set-theoretic
notions to numbers to  unique factorization. Several new  concepts of
questionable value were derived.

AM did  this all on its  own; if I  sit at the keyboard  and guide it
along, the same  discoveries can  be made in  about a  third the  cpu
time.  <Results again: out of order!>

200 jobs from the  agenda were executed during the cpu  hour AM used.
Each  job was allocated from  1 to 100 cpu  seconds, depending on its
priority value.  Most  jobs were granted  about 30 seconds, but  used
under 20.

The  1-hour run  ended  because  of space  problems,  and because  AM
couldn't  fill  in  very  good heuristics  for  the  new  concepts it
created.  As  it got  further and further  from its initial  starting
basis, the methods  it was able to bring to  bear on its new concepts
got relatively weaker and weaker.   For example, it was dealing  with
primes using set-theoretic heuristics.

To correct this, AM  would need a body of  good meta-heuristics; that
is, rules for finding and filling in new rules.  At the moment, it
doesn't have any.   Since most heuristics  are based on hindsight,  I
would  guess that  most of  the meta-heuristics  couldn't be  applied
until  AM has  already made  most of  the possible mistakes  it could
make.  I'm  not sure, and  I leave this  as an open research  problem
<SLIDE: open problem>.

All was not perfect, of  course. Several tasks were chosen at bizarre
moments, and many poor tasks  were chosen. <SLIDE: Losers again>  One
danger that AM  is prone to is  that of regular or  looping behavior.
Any kind of repetition should -- but doesn't -- warn AM to be careful
and look for a  unifying generalization of  what it's doing, to  rise
above the  infinite loop it's fallen  into.  That's how  this concept
arose  (ComposeoCompose...).  Similarly,  AM repeated multiplication,
then repeated exponentiation, then repeated hyper-exponentiation,...

Although any specific example I give you might have been fixed by adding
one  simple  rule,  the  general  problem  of  detecting  regularity,
infinite loops, is quite difficult and  I leave that as another  open
problem.

IF ASKED: A HANDLE ON THE SOLUTION IS:

.ONCE INDENT 5,5,5

By   investigating  each   concept  it   defined,   and  basing   its
interestingness judgments on those results, AM was able to avoid most
such traps.

.S(Experiments With AM)

One  important  reason  for actually  constructing  AM  was  that  of
⊗4experimentation⊗*: one  can vary the concepts  AM starts with, vary
the  heuristics available,  etc.,  and  study  the  effects  on  AM's
behavior.  <SLIDE: expts> Several such experiments were performed.


One involved adding a couple dozen  new concepts from an entirely new
domain:  plane  geometry.    AM  busied  itself  exploring elementary
geometric concepts, but was not quite as  productive there  as in  its
original  domain.   New concepts  were defined,  and  new conjectures
formulated.  For example, AM had the notion of identical equality  of
lines  and  of  triangles, and  generalized  these  into  similarity,
parallelism, congruence, etc.  It noted many new concepts, like triangles
sharing a  common side  and  a common  angle. AM  also found  a  cute
geometric  interpretation  of  Goldbach's conjecture,  which  it  had
proposed earlier: <SLIDE: prime angles> Any angle, from 0 to 180↑o,
can be approximated (to within 1↑o) as the sum of two angles of prime
degree.   If our  culture and  technology were different,  this might
have been an important, familiar result.  Incidentally, much  sharper
results and much more general ones were also obtained.

<SLIDE:  expts again>  Several  of  the  experiments on  AM  involved
de-tuning the system, e.g., radically altering the formula which assigned
each task  a priority  rating.   Or: radically  altering the  initial
Worth  numbers  of  the  concepts.    Generally,  AM's  behavior  was
degraded, but overall it was much more robust than anticipated.

An  interesting result was that AM  uses precise numeric values very
rarely, only in very blind situations, where no decent reasons exist.
When AM  enters a chain of  discoveries, it's sustained  by masses of
reasons, not by nuances in their numeric ratings.

Another unexpected result  of this  experiment was that  the top  few
jobs on the agenda  almost always suggest new high-priority  jobs, so
that a job sitting  in tenth place on the agenda at some moment has a
very low chance of ⊗4ever⊗*  being chosen, unless of course some  new
reasons for it emerge.  So  even if you had a kilo-processor machine,
executing  all the  tasks on  the agenda simultaneously,  it wouldn't
really speed up the rate at which AM makes great discoveries by more
than a single factor of ten.  Not only wouldn't it help, it would
reduce the user's opinion  of AM's rationality: a  lot of that  stems
from AM's choice of which task is best at each moment.

Let me move on to another kind of experiment  that was performed.  It
was expected that  a few key concepts (e.g.  Equality) and heuristics
(e.g., define  "f" as  repeated "g"-ing,  where g  is an  interesting
operation)  would have  a tremendous  impact on  AM's behavior:  that
removing  them  would destroy  most of  AM's  discoveries.   This was
partly  true,  but  a  surprising  effect  was   noticed  when  these
experiments were actually done last week. Many of the very common and
valuable concepts manage to be rediscovered in many separate ways, so
that the absence of  any single concept or heuristic  isn't critical.
Multiplication  arises in four  separate ways,  so one would  have to
remove many concepts to prevent its being noticed.

Several more experiments have been planned for the near  future. This
Fall, I intend to add about twenty new concepts, dealing with proof
and  proof  techniques, to  see how  the  reasons for  conjecturing a
theorem might aid in its subsequent proof.

To me personally, this whole business of experimenting with AM has
been the most exciting aspect  of the project.  It's like using AM as
a tool, to explore the process of discovery -- at least discovery  in
very elementary mathematics.

.S(Maximally-Divisible Numbers)

While working on its own, AM has not discovered any mathematics which
is  new and publishable,  although it has  done so when  working as a
co-researcher with a human. AM noticed simple new concepts which most
humans had  overlooked, but AM  by itself  was not able  to precisely
formulate interesting statements about those concepts.

Besides the cute Goldbach interpretation just described, AM motivated
the development of  a new result in  number theory.  I'll  close this
slide show with one slide about it. <SLIDE: AM conjec>

Just  as  AM  decided  to  look  at numbers  with  extremely  ⊗4few⊗*
divisors, it  thought  to  look at  those  with  abnormally  ⊗4many⊗*
divisors.   For any  number d, the  smallest number  with at least  d
divisors is at least this big. If it's factored into primes, then
their exponents decrease inversely as the logs of the primes.  

For example,  here's the smallest number with 6 million divisors.
The AM  conjecture predicted that  this number n
would be this big,  so it was off  by less than a  factor of two,  so
it's  a  very  sharp  bound.   Erdos  recently  showed  me  the  best
previously-known formula bounding n(d), and it could only guarantee
that n would be bigger than this, so it was off by more than  a whole
order of magnitude.

⊗4AM  made this  definition⊗*,  found examples  of such  highly-composite
numbers,  and then suggested something like this relationship. Later,
by hand, I  was able to derive  these formulae, and with  Erdos' help
this one.

A couple  months ago, I found  out that Ramanujan  defined these same
numbers and found a  similar regularity to  this one.   As far as  is
known, this particular formula is new.

It was AM which expected that this kind of maximally-divisible number
would be very interesting.  I thought I knew better, but in this case
AM really was right.

In dozens of other instances, AM seemed to me to be going nowhere and
in  reality it was actually  going nowhere. It  outsmarted me in this
cute way only twice.

.S(Conclusions)

What is the final conclusion of all this?

AM is forced to judge ⊗4a priori⊗* the value  of each new concept; it
⊗4must⊗*  quickly lose  interest in  concepts  which aren't  going to
develop into anything.   Often, such judgments  can only be based  on
hindsight.   

For similar reasons,  AM has difficulty  formulating new
heuristics  which are relevant  to the  new concepts it  creates.  It
needed good meta-heuristics, and I don't even know  whether any exist
for this domain.

While AM's "approach" to empirical research may be used in other very
hard scientific domains, the main limitation (reliance on  hindsight)
will probably recur.  This prevents AM  -- and future similar systems
-- from progressing indefinitely far on their own.

Before this  ultimate limitation was reached, AM  had in fact carried
out some  math  research.   The research  was  of a  very  elementary
character, and  the same program  may not be  able to carry  out much
more sophisticated theory formation, but nevertheless we can conclude
that certain  kinds  of hack  discoveries,  certain routine  research
tasks,  can  be represented  adequately  as  a rule-guided  heuristic
search.

.SKIP TO COLUMN 1

*********************************************************************

Ideally, one could run AM overnight, and wake up the next morning  to
find five or  ten more or less  interesting new concepts to  look at.
It  would  still   be  the  researcher  who  made  the  highest-level
selections of  what  to do  next,  much  like a  professor  directing
graduate students.  This is the ultimate use that I hope such systems
will be put to -- probably in the next century: to work synergetically
alongside a  researcher,  to  free him  from  doing all  the  routine
detective work.

.S(Supplement)

↓_ Finding canonizing function for numbers_↓

Experiment number 1  was to  convert the arguments  from one type  of
structure  to  another,  and   see  if  that  changed  the  value  of
SAME-LENGTH when it was applied to them.  It turned out not to, which
meant that one single canonical type of structure could be chosen.

But there  are 4  kinds it  could be:  a set,  a list, a  bag, or  an
ordered set.   The particular kind of  structure depends on 2 things:
whether altering the order of the elements in the arguments makes any
difference to SAME-LENGTH, and whether adding multiple occurrences of
the same element makes any difference to SAME-LENGTH.

But  changing  the  order  makes  no  difference,  so  the  canonical
structure type  must be  unordered, either BAG  or SET.   But  adding
multiple elements ⊗4does⊗* make a difference, so the type must be BAG
or LIST.  The only possible canonical type is therefore BAG.

Next, AM notices  a message in  the Ties facet  of SAME-LENGTH,  that
explicitly  says  that  it's  the  same   as  Equality,  except  that
SAME-LENGTH does not recurse in the CAR direction. The CAR of an atom
is its  value cell, so  that message  is enough  evidence to  warrant
testing whether or  not the value cells of any  of these elements are
ever  accessed.  Empirically,  they never  are. So AM  assumes it can
replace them all by a single, fixed value: say "T".

Putting this all together,  we see the canonizing  transformation as:
convert the structure to a BAG, and substitute T for each element.
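
(Putting the pieces into code form, as a modern Common Lisp sketch with names of my own choosing:)

.B
(defun canonize (structure)
  ;; the canonizing transformation: treat the structure as a BAG and
  ;; replace every element by T (so order and identity no longer matter)
  (mapcar (constantly t) structure))

(defun same-length-canonically (x y)
  ;; equivalent to SAME-LENGTH: equal canonical forms, i.e. equal "numbers"
  (equal (canonize x) (canonize y)))

;; (canonize '(a b c)) => (T T T)   -- these bags of T's behave like numbers
.E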

**********************************************************************

Another thought I hope you'll leave with today is that (for this task
at least) if the heuristic operators are sophisticated enough, then a
simple numeric calculus is all the meta-level heuristics you need for
adequate   performance.      And   you   don't   need   any  explicit
meta-meta-heuristics at all.  It would be interesting to  see if this
is really true for other tasks as well.  I suspect that this is true
whenever the heuristics are fairly independent of each other.

One problem we all  face is how  to cope with  changes in a  system's
knowledge  base. The  standard  solution is  to  track down  all  the
effects of each change, and update the whole data base.  The solution
AM uses,  and  perhaps people  too, is  to  just record  the  changed
information, and ⊗4not⊗* to update  everything.  When a contradiction
of  some kind occurs, then the  conflicting knowledge is checked, the
knowledge ⊗4it⊗*  was  based  on is  checked,  and  so on,  until  we
encounter some  changed opinion which explains the  disagreement.  So
the system is allowed to hold contradictory viewpoints, so long as it
could resolve any  dispute if it  had to.   This is one  solution you
might  keep in mind  for your  knowledge-based systems. I  think that
⊗4this⊗* works because there are so few contradictions; that  in turn
is due to the independence of the reasoning involved.